Releasing multiply imputed, synthetic public use microdata: an illustration and empirical study

Authors

  • Jerome P. Reiter
  • J. P. Reiter
Abstract

The paper presents an illustration and empirical study of releasing multiply imputed, fully synthetic public use microdata. Simulations based on data from the US Current Population Survey are used to evaluate the potential validity of inferences based on fully synthetic data for a variety of descriptive and analytic estimands, to assess the degree of protection of confidentiality that is afforded by fully synthetic data and to illustrate the specification of synthetic data imputation models. Benefits and limitations of releasing fully synthetic data sets are discussed.

Similar resources

Significance tests for multi-component estimands from multiply imputed, synthetic microdata

To limit the risks of disclosures when releasing data to the public, it has been suggested that statistical agencies release multiply imputed, synthetic microdata. For example, the released microdata can be fully synthetic, comprising random samples of units from the sampling frame with simulated values of variables. Or, the released microdata can be partially synthetic, comprising the units or...

Sampling with Synthesis: A New Approach for Releasing Public Use Census Microdata

Many statistical agencies disseminate samples of census microdata, i.e., data on individual records, to the public. Before releasing the microdata, agencies typically alter identifying or sensitive values to protect data subjects’ confidentiality, for example by coarsening, perturbing, or swapping data. These standard disclosure limitation techniques distort relationships and distributional fea...

Combining Methods to Create Synthetic Microdata: Quantile Regression, Hot Deck, and Rank Swapping

Government agencies must simultaneously disseminate useful microdata and maintain confidentiality of individual records. Releasing synthetic data is one approach. We propose to create synthetic data using a combination of quantile regression, hot deck imputation, and rank swapping. The result is a releasable data set containing original values for a few key variables, synthetic quantile regress...

Distribution-Preserving Statistical Disclosure Limitation

One approach to limiting disclosure risk in public-use microdata is to release multiply-imputed, partially synthetic data sets. These are data on actual respondents, but with confidential data replaced by multiply-imputed synthetic values. A mis-specified imputation model can invalidate inferences based on the partially synthetic data, because the imputation model determines the distribution of s...

Distribution-preserving statistical disclosure limitation

One approach to limiting disclosure risk in public-use microdata is to release multiply-imputed, partially synthetic data sets. These are data on actual respondents, but with confidential data replaced by multiply-imputed synthetic values. A mis-specified imputation model can invalidate inferences because the distribution of synthetic data is completely determined by the model used to generate th...

Publication date: 2004